NSF PAR Search | NSF Public Access Repository

Generating Test Databases for Database-Backed Applications

https://doi.org/10.1109/ICSE48619.2023.00173

Yan, Cong; Nath, Suman; Lu, Shan (May 2023, 45th International Conference on Software Engineering (ICSE))

Full Text Available
SlabCity: Whole-Query Optimization Using Program Synthesis

https://doi.org/10.14778/3611479.3611515

Dong, Rui; Liu, Jie; Zhu, Yuxuan; Yan, Cong; Mozafari, Barzan; Wang, Xinyu (July 2023, Proceedings of the VLDB Endowment)

Query rewriting is often a prerequisite for effective query optimization, particularly for poorly-written queries. Prior work on query rewriting has relied on a set of "rules" based on syntactic pattern-matching. Whether relying on manual rules or auto-generated ones, rule-based query rewriters are inherently limited in their ability to handle new query patterns. Their success is limited by the quality and quantity of the rules provided to them. To our knowledge, we present the first synthesis-based query rewriting technique, SlabCity, capable of whole-query optimization without relying on any rewrite rules. SlabCity directly searches the space of SQL queries using a novel query synthesis algorithm that leverages a new concept called query dataflows. We evaluate SlabCity on four workloads, including a newly curated benchmark with more than 1000 real-life queries. We show that not only can SlabCity optimize more queries than state-of-the-art query rewriting techniques, but interestingly, it also leads to queries that are significantly faster than those generated by rule-based systems.
Full Text Available
Towards Auto-Generated Data Systems

https://doi.org/10.14778/3611540.3611635

Cheung, Alvin; Ahmad, Maaz Bin; Haynes, Brandon; Kittivorawong, Chanwut; Laddad, Shadaj; Liu, Xiaoxuan; Wang, Chenglong; Yan, Cong (August 2023, Proceedings of the VLDB Endowment)

After decades of progress, database management systems (DBMSs) are now the backbones of many data applications that we interact with on a daily basis. Yet, with the emergence of new data types and hardware, building and optimizing new data systems remain as difficult as in the heyday of relational databases. In this paper, we summarize our work towards automating the building and optimization of data systems. Drawing from our own experience, we further argue that any automation technique must address three aspects: user specification, code generation, and result validation. We conclude by discussing a case study using video data processing, along with opportunities for future research towards designing data systems that are automatically generated.
Full Text Available
Leveraging Application Data Constraints to Optimize Database-Backed Web Applications

https://doi.org/10.14778/3583140.3583141

Liu, Xiaoxuan; Wang, Shuxian; Sun, Mengzhu; Pan, Sicheng; Li, Ge; Jha, Siddharth; Yan, Cong; Yang, Junwen; Lu, Shan; Cheung, Alvin (February 2023, Proceedings of the VLDB Endowment)

Exploiting the relationships among data is a classical query optimization technique. As persistent data is increasingly being created and maintained programmatically, prior work that infers data relationships from data statistics misses an important opportunity. We present Coco, the first tool that identifies data relationships by analyzing database-backed applications. Once identified, Coco leverages the constraints to optimize the application's physical design and query execution. Instead of developing a fixed set of predefined rewriting rules, Coco employs an enumerate-test-verify technique to automatically exploit the discovered data constraints to improve query execution. Each resulting rewrite is provably equivalent to the original query. Using 14 real-world web applications, our experiments show that Coco can discover numerous data constraints from code analysis and improve real-world application performance significantly.
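As a rough illustration of how a data constraint can license a query rewrite, the sketch below covers only the "enumerate" and "test" steps mentioned in the abstract. It is not Coco's implementation: the `users` table, the uniqueness constraint on `email`, the hand-written candidate list, and the `run` helper are all assumptions made for this example, and the formal "verify" step (proving equivalence) is omitted.

```python
import sqlite3


def run(conn, sql):
    """Return the query result as a sorted list so row order does not matter."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the application is assumed to guarantee that
# users.email is unique; a UNIQUE constraint stands in for that guarantee.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("c@example.com",)],
)

original = "SELECT DISTINCT email FROM users"

# Enumerate: candidate rewrites (written by hand here, not generated).
candidates = [
    "SELECT email FROM users",               # drops the redundant DISTINCT
    "SELECT email FROM users WHERE id > 1",  # changes the result, must be rejected
]

# Test: keep only candidates that agree with the original on sample data.
expected = run(conn, original)
survivors = [c for c in candidates if run(conn, c) == expected]
print(survivors)
```

Agreement on sample data is only a filter. A candidate that survives it would still need an equivalence proof under the discovered constraint before it could safely replace the original query.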
Full Text Available
HBO1 catalyzes lysine benzoylation in mammalian cells

https://doi.org/10.1016/j.isci.2022.105443

Tan, Doudou; Wei, Wei; Han, Zhen; Ren, Xuelian; Yan, Cong; Qi, Shankang; Song, Xiaohan; Zheng, Y. George; Wong, Jiemin; Huang, He (November 2022, iScience)

Full Text Available
Generating application-specific data layouts for in-memory databases

https://doi.org/10.14778/3342263.3342630

Yan, Cong; Cheung, Alvin (July 2019, Proceedings of the VLDB Endowment)

Full Text Available
How not to structure your database-backed web applications: a study of performance bugs in the wild

https://doi.org/10.1145/3180155.3180194

Yang, Junwen; Subramaniam, Pranav; Lu, Shan; Yan, Cong; Cheung, Alvin (January 2018, ICSE '18 Proceedings of the 40th International Conference on Software Engineering)

Many web applications use databases for persistent data storage, and using Object Relational Mapping (ORM) frameworks is a common way to develop such database-backed web applications. Unfortunately, developing efficient ORM applications is challenging, as the ORM framework hides the underlying database query generation and execution. This problem is becoming more severe as these applications need to process an increasingly large amount of persistent data. Recent research has targeted specific aspects of performance problems in ORM applications. However, there has not been any systematic study to identify common performance anti-patterns in such real-world applications, how they affect the resulting application performance, and remedies for them. In this paper, we try to answer these questions through a comprehensive study of 12 representative real-world ORM applications. We generalize 9 ORM performance anti-patterns from more than 200 performance issues that we obtain by studying their bug-tracking systems and profiling their latest versions. To prove our point, we manually fix 64 performance issues in their latest versions and obtain a median speedup of 2× (up to 39×) with fewer than 5 lines of code change in most cases. Many of the issues we found have been confirmed by developers, and we have implemented ways to identify other code fragments with similar issues as well.
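The abstract does not list the nine anti-patterns, so the sketch below shows one widely known ORM-related inefficiency, the "N+1 query" pattern, purely as an assumed example of the kind of issue such a study might cover. The `posts` and `comments` tables and the `query` helper are hypothetical, and plain `sqlite3` stands in for an ORM so that the issued queries are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
INSERT INTO posts (id, title) VALUES (1, 'first'), (2, 'second');
INSERT INTO comments (post_id, body) VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

issued = []  # every SQL string sent through query(), for counting


def query(sql, params=()):
    """Run a query and record it, mimicking what an ORM does behind the scenes."""
    issued.append(sql)
    return conn.execute(sql, params).fetchall()


def comments_n_plus_one():
    """One query for the posts, then one more query per post."""
    result = {}
    for post_id, title in query("SELECT id, title FROM posts ORDER BY id"):
        rows = query(
            "SELECT body FROM comments WHERE post_id = ? ORDER BY id", (post_id,)
        )
        result[title] = [body for (body,) in rows]
    return result


def comments_single_query():
    """The same data fetched with a single join."""
    result = {}
    rows = query(
        "SELECT p.title, c.body FROM posts p "
        "LEFT JOIN comments c ON c.post_id = p.id ORDER BY p.id, c.id"
    )
    for title, body in rows:
        result.setdefault(title, [])
        if body is not None:  # a post without comments yields a NULL body
            result[title].append(body)
    return result


slow = comments_n_plus_one()
slow_count = len(issued)
issued.clear()
fast = comments_single_query()
fast_count = len(issued)
print(slow == fast, slow_count, fast_count)
```

Both functions return the same mapping, but the first issues one query per post on top of the initial one, so its query count grows with the table size while the join stays at a single query.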
Full Text Available